Goto

Collaborating Authors

 Promising Solution


Solving Most Systems of Random Quadratic Equations

Neural Information Processing Systems

This paper deals with finding an $n$-dimensional solution $\bm{x}$ to a system of quadratic equations $y_i=|\langle\bm{a}_i,\bm{x}\rangle|^2$, $1\le i \le m$, which in general is known to be NP-hard. We put forth a novel procedure, that starts with a \emph{weighted maximal correlation initialization} obtainable with a few power iterations, followed by successive refinements based on \emph{iteratively reweighted gradient-type iterations}. The novel techniques distinguish themselves from prior works by the inclusion of a fresh (re)weighting regularization. For certain random measurement models, the proposed procedure returns the true solution $\bm{x}$ with high probability in time proportional to reading the data $\{(\bm{a}_i;y_i)\}_{1\le i \le m}$, provided that the number $m$ of equations is some constant $c> 0$ times the number $n$ of unknowns, that is, $m\ge cn$. Empirically, the upshots of this contribution are: i) perfect signal recovery in the high-dimensional regime given only an \emph{information-theoretic limit number} of equations; and, ii) (near-)optimal statistical accuracy in the presence of additive noise. Extensive numerical tests using both synthetic data and real images corroborate its improved signal recovery performance and computational efficiency relative to state-of-the-art approaches.


Large-Scale Price Optimization via Network Flow

Neural Information Processing Systems

This paper deals with price optimization, which is to find the best pricing strategy that maximizes revenue or profit, on the basis of demand forecasting models. Though recent advances in regression technologies have made it possible to reveal price-demand relationship of a number of multiple products, most existing price optimization methods, such as mixed integer programming formulation, cannot handle tens or hundreds of products because of their high computational costs. To cope with this problem, this paper proposes a novel approach based on network flow algorithms. We reveal a connection between supermodularity of the revenue and cross elasticity of demand. On the basis of this connection, we propose an efficient algorithm that employs network flow algorithms. The proposed algorithm can handle hundreds or thousands of products, and returns an exact optimal solution under an assumption regarding cross elasticity of demand. Even in case in which the assumption does not hold, the proposed algorithm can efficiently find approximate solutions as good as can other state-of-the-art methods, as empirical results show.


Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation

Neural Information Processing Systems

Estimators of information theoretic measures such as entropy and mutual information from samples are a basic workhorse for many downstream applications in modern data science. State of the art approaches have been either geometric (nearest neighbor (NN) based) or kernel based (with bandwidth chosen to be data independent and vanishing sub linearly in the sample size). In this paper we combine both these approaches to design new estimators of entropy and mutual information that strongly outperform all state of the art methods. Our estimator uses bandwidth choice of fixed $k$-NN distances; such a choice is both data dependent and linearly vanishing in the sample size and necessitates a bias cancellation term that is universal and independent of the underlying distribution. As a byproduct, we obtain a unified way of obtaining both kernel and NN estimators. The corresponding theoretical contribution relating the geometry of NN distances to asymptotic order statistics is of independent mathematical interest.


Dialog-based Language Learning

Neural Information Processing Systems

A long-term goal of machine learning research is to build an intelligent dialog agent. Most research in natural language understanding has focused on learning from fixed training sets of labeled data, with supervision either at the word level (tagging, parsing tasks) or sentence level (question answering, machine translation). This kind of supervision is not realistic of how humans learn, where language is both learned by, and used for, communication. In this work, we study dialog-based language learning, where supervision is given naturally and implicitly in the response of the dialog partner during the conversation. We study this setup in two domains: the bAbI dataset of (Weston et al., 2015) and large-scale question answering from (Dodge et al., 2015). We evaluate a set of baseline learning strategies on these tasks, and show that a novel model incorporating predictive lookahead is a promising approach for learning from a teacher's response. In particular, a surprising result is that it can learn to answer questions correctly without any reward-based supervision at all.


Data-Efficient Hierarchical Reinforcement Learning

Neural Information Processing Systems

Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios. In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control. For generality, we develop a scheme where lower-level controllers are supervised with goals that are learned and proposed automatically by the higher-level controllers. To address efficiency, we propose to use off-policy experience for both higher-and lower-level training.


BRITS: Bidirectional Recurrent Imputation for Time Series

Neural Information Processing Systems

Time series are widely used as signals in many classification/regression tasks. It is ubiquitous that time series contains many missing values. Given multiple correlated time series data, how to fill in missing values and to predict their class labels? Existing imputation methods often impose strong assumptions of the underlying data generating process, such as linear dynamics in the state space. In this paper, we propose BRITS, a novel method based on recurrent neural networks for missing value imputation in time series data.



Learning Credal Ensembles via Distributionally Robust Optimization

Wang, Kaizheng, Faza, Ghifari Adam, Cuzzolin, Fabio, Chau, Siu Lun, Moens, David, Hallez, Hans

arXiv.org Machine Learning

Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncertainty (EU) and have been shown to improve model robustness in various settings. However, most state-of-the-art methods mainly define EU as disagreement caused by random training initializations, which mostly reflects sensitivity to optimization randomness rather than uncertainty from deeper sources. To address this, we define EU as disagreement among models trained with varying relaxations of the i.i.d. assumption between training and test data. Based on this idea, we propose CreDRO, which learns an ensemble of plausible models through distributionally robust optimization. As a result, CreDRO captures EU not only from training randomness but also from meaningful disagreement due to potential distribution shifts between training and test data. Empirical results show that CreDRO consistently outperforms existing credal methods on tasks such as out-of-distribution detection across multiple benchmarks and selective classification in medical applications.


Towards Anytime-Valid Statistical Watermarking

Huang, Baihe, Xu, Eric, Ramchandran, Kannan, Jiao, Jiantao, Jordan, Michael I.

arXiv.org Machine Learning

The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach for selecting sampling distributions and the reliance on fixed-horizon hypothesis testing, which precludes valid early stopping. In this paper, we bridge this gap by developing the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference. Unlike traditional approaches where optional stopping invalidates Type-I error guarantees, our framework enables valid, anytime-inference by constructing a test supermartingale for the detection process. By leveraging an anchor distribution to approximate the target model, we characterize the optimal e-value with respect to the worst-case log-growth rate and derive the optimal expected stopping time. Our theoretical claims are substantiated by simulations and evaluations on established benchmarks, showing that our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13-15% relative to state-of-the-art baselines.